Flexible Turn-Taking for Spoken Dialog Systems
Abstract
Even as progress in speech technologies and in task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient. The goal of this thesis is to provide a framework and models to endow spoken dialog systems with robust and flexible turn-taking abilities. To this end, we designed a new dialog system architecture that combines a high-level Dialog Manager (DM) with a low-level Interaction Manager (IM). While the DM operates on user and system turns, the IM operates at the sub-turn level, acting as the interface between the real-time information of sensors and actuators and the symbolic information of the DM. In addition, the IM controls reactive behavior, such as interrupting a system prompt when the user barges in. We propose two approaches to control turn-taking in the IM. First, we designed an optimization method to dynamically set the pause duration threshold used to detect the end of user turns. Using a wide range of dialog features, this algorithm allowed us to reduce average system latency by as much as 22% over a fixed-threshold baseline, while keeping the detection error rate constant. Second, we proposed a general, flexible model to control the turn-taking behavior of conversational agents. This model, the Finite-State Turn-Taking Machine (FSTTM), builds on previous work on 6-state representations of the conversational floor and extends them in two ways. First, it incorporates the notion of turn-taking action (such as grabbing or releasing the floor) and of state-dependent action cost. Second, it models the uncertainty that comes from imperfect recognition of the user's turn-taking intentions. Experimental results show that this approach performs significantly better than the threshold optimization method for end-of-turn detection, with latencies up to 40% shorter than a fixed-threshold baseline.
We also applied the FSTTM model to the problem of interruption detection, which reduced detection latency by 11% over a strong heuristic baseline. The architecture as well as all the models proposed in this thesis were evaluated on the CMU Let’s Go bus information system, a publicly available telephone-based dialog system that provides bus schedule information to the Pittsburgh population.
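The idea of choosing a turn-taking action from state-dependent costs under uncertainty about the floor can be illustrated with a minimal sketch. This is not the thesis's implementation: it collapses the 6-state floor representation to two hypothetical states, and the cost values, the names FloorState, Action, COSTS, expected_cost and select_action, and the input probability p_free are all assumptions made for illustration only.

```python
from enum import Enum


class FloorState(Enum):
    USER = "user"  # the user still holds the floor (the pause is mid-turn)
    FREE = "free"  # the user has released the floor


class Action(Enum):
    WAIT = "wait"  # the system stays silent
    GRAB = "grab"  # the system takes the floor and starts speaking


# Hypothetical state-dependent action costs (illustrative values only).
COSTS = {
    (FloorState.USER, Action.WAIT): 0.0,  # correctly letting the user go on
    (FloorState.USER, Action.GRAB): 1.0,  # cutting in on the user
    (FloorState.FREE, Action.WAIT): 0.3,  # needless latency
    (FloorState.FREE, Action.GRAB): 0.0,  # correctly taking the turn
}


def expected_cost(action: Action, p_free: float) -> float:
    """Expected cost of an action given the belief that the floor is free."""
    belief = {FloorState.FREE: p_free, FloorState.USER: 1.0 - p_free}
    return sum(belief[state] * COSTS[(state, action)] for state in FloorState)


def select_action(p_free: float) -> Action:
    """Pick the action with the lowest expected cost (ties favor WAIT)."""
    if not 0.0 <= p_free <= 1.0:
        raise ValueError("p_free must be a probability in [0, 1]")
    return min(Action, key=lambda action: expected_cost(action, p_free))


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.9):
        print(p, select_action(p).value)
```

With these assumed costs, the system grabs the floor only once the estimated probability that the user has finished exceeds roughly 0.77, the point at which the expected cost of waiting overtakes the expected cost of interrupting. Changing the costs moves that break-even point, which is how a cost-based formulation can trade latency against cut-ins.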
Similar resources
Multiparty Turn Taking in Situated Dialog: Study, Lessons, and Directions
We report on an empirical study of a multiparty turn-taking model for physically situated spoken dialog systems. We present subjective and objective performance measures that show how the model, supported with a basic set of sensory competencies and turn-taking policies, can enable interactions with multiple participants in a collaborative task setting. The analysis brings to the fore several p...
An Incremental Turn-Taking Model with Active System Barge-in for Spoken Dialog Systems
This paper deals with an incremental turn-taking model that provides a novel solution for end-of-turn detection. It includes a flexible framework that enables active system barge-in. In order to accomplish this, a systematic procedure of teaching a dialog system to produce meaningful system barge-in is presented. This procedure improves system robustness and success rate. It includes constructin...
Computational Models for Multiparty Turn-Taking
We describe a computational framework for modeling and managing turn-taking in open-world spoken dialog systems. We present a representation and methodology for tracking the conversational dynamics in multiparty interactions, making floor control decisions, and rendering these decisions into appropriate behaviors. We show how the approach enables an embodied conversational agent to participate i...
A Finite-State Turn-Taking Model for Spoken Dialog Systems
This paper introduces the Finite-State Turn-Taking Machine (FSTTM), a new model to control the turn-taking behavior of conversational agents. Based on a non-deterministic finite-state machine, the FSTTM uses a cost matrix and decision-theoretic principles to select a turn-taking action at any time. We show how the model can be applied to the problem of end-of-turn detection. Evaluation results o...
Optimizing End-of-Turn Detection for Spoken Dialog Systems
This paper presents an overview of our previously published work on the problem of end-of-turn detection in spoken dialog systems, which consists in determining whether the user has completed their turn as they are speaking it. Over the past few years, we designed two new models that exploit contextual features to significantly reduce system latency at the end of user turns without increasing t...
Publication date: 2008